A New Support Measure for Items in Streams
Abstract
Mining streams is a challenging problem, because the data can be looked at only once, and only small summaries of the data can be stored. We present a new frequency measure for items in streams that does not rely on a fixed window length or a time-decaying factor. Based on the properties of the measure, we present an algorithm to compute it. Experimental evaluation supports the claim that the new measure can be computed from a summary with very small memory requirements that can be maintained and updated efficiently. In this extended abstract, the main points of the presentation are discussed.

1 Motivation

Mining frequent items over streams has recently received a lot of attention. It presents interesting new challenges compared with traditional mining in static databases. It is assumed that the stream can be scanned only once; hence, once an item has passed, it cannot be revisited unless it is stored in main memory. Storing large parts of the stream, however, is not possible, because the amount of data passing by is typically huge.

Different models have already been proposed in the literature. Their main distinguishing characteristic is how the frequency of an item is measured. Three types of models can be distinguished: (1) the sliding window model, (2) the time-fading model, and (3) the landmark model. In the sliding window model [1, 3, 6, 8, 10], only the most recent events are used to determine the frequency of an item. To avoid recounting the supports over the whole window at every time point, these algorithms instead update the item frequencies incrementally, based on the deletion of the oldest transactions and the insertion of new ones. In the time-fading model, the past is still considered important, but not as important as the present. This is modelled by gradually fading away the past [9]. That is, there is, e.g., a

∗The presentation is based on material presented in the ECML/PKDD’06 workshop International Workshop on Knowledge Discovery from Data Streams (IWKDDS) [2].
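The two counting schemes described above, the sliding window and the time-fading model, can be contrasted in a short Python sketch. This is a generic illustration of those two models, not the algorithm or the new measure of this paper. The class names, the `window_size` parameter, the exponential `decay` factor, and the simplification that each stream element is a single item (rather than a transaction) are all assumptions chosen for illustration.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Item frequencies over the last `window_size` items of a stream.

    Counts are maintained incrementally: every arrival is an insertion and,
    once the window is full, the oldest item is deleted, so the window is
    never recounted from scratch. (Illustrative sketch, hypothetical API.)
    """

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.window: deque = deque()
        self.counts: Counter = Counter()

    def add(self, item) -> None:
        # Insertion of the new item.
        self.window.append(item)
        self.counts[item] += 1
        # Deletion of the oldest item once the window overflows.
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequency(self, item) -> float:
        if not self.window:
            return 0.0
        # Counter returns 0 for unseen items without inserting them.
        return self.counts[item] / len(self.window)


class DecayedCounter:
    """Time-faded counts: before each arrival, every existing count (and the
    total) is multiplied by `decay`, so older items weigh less.
    `decay` is an assumed parameter in (0, 1]. (Illustrative sketch.)
    """

    def __init__(self, decay: float) -> None:
        if not 0 < decay <= 1:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.counts: dict = {}
        self.total = 0.0

    def add(self, item) -> None:
        # Fade the past: O(number of distinct items) per arrival.
        for key in self.counts:
            self.counts[key] *= self.decay
        self.total = self.total * self.decay + 1.0
        # The present item gets full weight.
        self.counts[item] = self.counts.get(item, 0.0) + 1.0

    def frequency(self, item) -> float:
        if self.total == 0:
            return 0.0
        return self.counts.get(item, 0.0) / self.total


if __name__ == "__main__":
    sw = SlidingWindowCounter(window_size=3)
    fd = DecayedCounter(decay=0.5)
    for x in ["a", "b", "a", "c", "a", "b"]:
        sw.add(x)
        fd.add(x)
    for item in ["a", "b", "c"]:
        print(item, round(sw.frequency(item), 3), round(fd.frequency(item), 3))
```

The sketch makes the trade-off visible: the sliding window must keep the whole window in memory to know what to delete, and its answer depends on the chosen `window_size`, while the faded counter stores one number per distinct item but its answer depends on the chosen `decay`. The measure proposed in this paper is meant to avoid fixing either parameter.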
Similar sources
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...
Application of genetic algorithm (GA) to select input variables in support vector machine (SVM) for analyzing the occurrence of roach, Rutilus rutilus, in streams
Support vector machine (SVM) was used to analyze the occurrence of roach in Flemish stream basins (Belgium). Several habitat and physico-chemical variables were used as inputs for the model development. The biotic variable merely consisted of abundance data which was used for predicting presence/absence of roach. Genetic algorithm (GA) was combined with SVM in order to select the most important...
A New Mathematical Model for the Prediction of Internal Recirculation in Impinging Streams Reactors
A mathematical model for the prediction of internal recirculation of complex impinging stream reactors has been presented. The model constitutes a repetition of a series of ideal plug flow reactors and CSTR reactors with recirculation. The simplicity of the repeating motif allows for the derivation of an algebraic relation of the whole system using the Laplace transform. An impinging stream...
Investigation of the factors affecting repeat pregnancy among mothers of children with thalassemia major attending the hematology clinics of hospitals affiliated with the Ministry of Health and Medical Education in 1371 (Iranian calendar)
A causal-comparative study was carried out on mothers of children with thalassemia major in Tehran. The purpose of the study was to identify factors affecting the family's decision for having a second child and disregarding the fact that this disorder may be carried to the new baby as well. A total of 300 mothers having children with thalassemia major attending the haematology clinic supervis...
Priority Setting Meets Multiple Streams: A Match to Be Further Examined?; Comment on “Introducing New Priority Setting and Resource Allocation Processes in a Canadian Healthcare Organization: A Case Study Analysis Informed by Multiple Streams Theory”
With demand for health services continuing to grow as populations age and new technologies emerge to meet health needs, healthcare policy-makers are under constant pressure to set priorities, ie, to make choices about the health services that can and cannot be funded within available resources. In a recent paper, Smith et al apply an influential policy studies framework – Kingdon’s multiple str...
Publication date: 2007